The Rodin class requires two data files: a features table and a class labels file. (… `feat_stat` must be set to `'ann'`.) To create a Rodin object from these files, use the following command:

```python
import rodin

obj = rodin.create_object_csv("./data/features.csv", "./data/class_labels.csv",
                              feat_sep=',', class_sep=',')
```
Once the Rodin object is created, it has three attributes:

- `X`: the intensity data for each sample.
- `features`: the metabolite feature data, including mass-to-charge ratios (m/z) and retention times.
- `samples`: the class data, i.e. the characteristics of the samples, such as treatment group, age, etc.
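To make the relationship between the three attributes concrete, here is a toy sketch built with pandas rather than the Rodin API. It assumes that `X` stores features as rows and samples as columns (the transform output reports `dim: 24233 X 71`, i.e. 24233 features by 71 samples), so `features` has one row per row of `X` and `samples` has one row per column of `X`. All values and the exact column layout are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data mirroring the three Rodin attributes (not the Rodin API).
# samples: one row per sample, taken from the class labels file
samples = pd.DataFrame({
    "Sample ID": ["N10", "H3", "N23", "H9"],
    "Dose": ["Control", "High", "Control", "High"],
    "Sex": ["Female", "Female", "Female", "Female"],
})

# features: one row per metabolite feature (m/z and retention time)
features = pd.DataFrame({
    "mz": [85.0284, 85.0285, 1253.8981],
    "time": [38.3, 213.1, 29.2],
})

# X: intensities, assumed to be features (rows) x samples (columns)
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.lognormal(mean=10, sigma=1, size=(len(features), len(samples))),
    index=features.index,
    columns=samples["Sample ID"],
)

# Row i of X belongs to features.iloc[i]; column j belongs to samples.iloc[j]
assert X.shape == (len(features), len(samples))

# Example: intensities of the first feature in the "High" dose group
high_ids = samples.loc[samples["Dose"] == "High", "Sample ID"].tolist()
print(X.loc[0, high_ids])
```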
```python
print(obj.samples)
```

```
   Sample ID     Dose     Sex
0        N10  Control  Female
1         H3     High  Female
2        N23  Control  Female
3        N21  Control  Female
4         H9     High  Female
..       ...      ...     ...
66       H15     High    Male
67       H16     High    Male
68       H21     High    Male
69       H27     High    Male
70       H22     High    Male

[71 rows x 3 columns]
```
The `transform` method provides a convenient way to preprocess the data. It comprises several steps:

- **Imputation**: fills in missing values in the dataset.
- **Feature filtering** (`thresh`): removes features whose fraction of missing values exceeds the specified threshold, so that less informative features do not reduce the quality and reliability of the dataset.
- **Normalization** (`norm`): normalizes the data using one of two methods (the example below uses `norm='q'`).
- **Log2 transformation** (`log`): applies a log2 transformation to the normalized data, which stabilizes variance across the dataset and makes it more suitable for linear models and other statistical analyses.

These steps prepare the data for further analysis, leaving it clean, normalized, and ready for meaningful interpretation. Any step can be skipped by setting its parameter to `None`.
```python
obj.transform(thresh=0.5, norm='q')
```

```
Number of features filtered: 1972
< Rodin object > dim: 24233 X 71
```
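The preprocessing pipeline described above can be sketched with NumPy and pandas as follows. This is not Rodin's implementation: the imputation rule (half of each feature's minimum observed value) is a hypothetical choice, quantile normalization is only an assumed reading of `norm='q'`, and `preprocess_sketch` is an illustrative name. The sketch filters before imputing, so features that will be dropped are never imputed.

```python
import numpy as np
import pandas as pd

def preprocess_sketch(X, thresh=0.5, log=True):
    """Hypothetical stand-in for a transform-like pipeline.

    X: features (rows) x samples (columns); NaN marks a missing value.
    Returns the processed matrix and the number of features filtered out.
    """
    # 1. Filtering: drop features whose fraction of missing values exceeds thresh
    keep = X.isna().mean(axis=1) <= thresh
    n_filtered = int((~keep).sum())
    X = X.loc[keep]

    # 2. Imputation (hypothetical rule): half of the feature's minimum observed value
    X = X.apply(lambda row: row.fillna(row.min() / 2), axis=1)

    # 3. Quantile normalization (assumed meaning of norm='q'): every sample
    #    (column) receives the same set of values, the mean of the sorted columns
    arr = X.to_numpy(dtype=float)
    reference = np.sort(arr, axis=0).mean(axis=1)
    order = np.argsort(arr, axis=0)
    normalized = np.empty_like(arr)
    for j in range(arr.shape[1]):
        normalized[order[:, j], j] = reference
    X = pd.DataFrame(normalized, index=X.index, columns=X.columns)

    # 4. Log2 transformation
    if log:
        X = np.log2(X)
    return X, n_filtered

X = pd.DataFrame(
    [[100.0, 200.0, np.nan, 400.0],
     [np.nan, np.nan, np.nan, 50.0],   # 75% missing: filtered out at thresh=0.5
     [10.0, 20.0, 30.0, 40.0]],
    columns=["S1", "S2", "S3", "S4"],
)
Xp, n = preprocess_sketch(X, thresh=0.5)
print(n)         # → 1
print(Xp.shape)  # → (2, 4)
```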
After the data in the `X` attribute has been transformed, statistical tests can be performed. This basic guide focuses on one-way ANOVA (analysis of variance).

To perform one-way ANOVA, call `oneway_anova` with the name of a column of the `samples` attribute as the argument. This column should represent the different groups or classes among your samples. After the test runs, the `features` attribute of the Rodin object is updated with the results: the `p_value(owa)` column holds the raw p-values and the `p_adj(owa)` column holds the p-values adjusted with the Benjamini-Hochberg method.
```python
obj.oneway_anova('Dose')
```

| | mz | time | p_value(owa) Dose | p_adj(owa) Dose |
|---|---|---|---|---|
| 0 | 85.0284 | 38.3 | 0.867775 | 0.929819 |
| 1 | 85.0285 | 213.1 | 0.003990 | 0.036638 |
| 2 | 85.0285 | 213.1 | 0.817019 | 0.900804 |
| 3 | 85.0285 | 69.4 | 0.960532 | 0.979629 |
| 4 | 85.0285 | 69.4 | 0.006463 | 0.049960 |
| ... | ... | ... | ... | ... |
| 26200 | 1245.6035 | 276.0 | 0.329061 | 0.536328 |
| 26201 | 1253.8981 | 29.2 | 0.000134 | 0.003188 |
| 26202 | 1255.9144 | 29.4 | 0.000485 | 0.008308 |
| 26203 | 1274.1295 | 40.1 | 0.154725 | 0.343972 |
| 26204 | 1274.1842 | 39.9 | 0.008926 | 0.060915 |
24233 rows × 4 columns
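A per-feature one-way ANOVA followed by Benjamini-Hochberg adjustment can be sketched as below. It uses SciPy's `f_oneway` and a hand-written BH step to illustrate the technique; it is not Rodin's code, and the function names and toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the p-values sorted in ascending order
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity: running minimum from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def oneway_anova_sketch(X, labels):
    """One-way ANOVA for every feature (row) of X across the groups in labels.

    labels: one class label per column of X (e.g. the Dose column of samples).
    """
    labels = np.asarray(labels)
    values = X.to_numpy(dtype=float)
    groups = [values[:, labels == g] for g in pd.unique(labels)]
    pvals = np.array([f_oneway(*(g[i] for g in groups)).pvalue
                      for i in range(len(X))])
    return pd.DataFrame({"p_value": pvals, "p_adj": bh_adjust(pvals)}, index=X.index)

# Toy data: feature 0 separates the groups, feature 1 does not
X = pd.DataFrame([[0.1, -0.2, 0.05, 0.0, 10.1, 9.9, 10.2, 10.0],
                  [1.0, 2.0, 3.0, 4.0, 2.0, 1.0, 4.0, 3.0]])
dose = ["Control"] * 4 + ["High"] * 4
print(oneway_anova_sketch(X, dose))
```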
The `fold_change` method is recommended before pathway analysis, as it calculates the log fold change between the classes in the dataset. If no reference class is provided as a parameter, the first class appearing in the specified column of the `samples` attribute is used as the reference.
```python
obj.fold_change('Dose')
```

| | mz | time | p_value(owa) Dose | p_adj(owa) Dose | lfc (High vs Control) | lfc (Low vs Control) | lfc (others vs Control) |
|---|---|---|---|---|---|---|---|
| 0 | 85.0284 | 38.3 | 0.867775 | 0.929819 | -0.092725 | -0.046480 | -0.069603 |
| 1 | 85.0285 | 213.1 | 0.003990 | 0.036638 | 0.128757 | -0.255070 | -0.063157 |
| 2 | 85.0285 | 213.1 | 0.817019 | 0.900804 | -0.052215 | 0.080238 | 0.014011 |
| 3 | 85.0285 | 69.4 | 0.960532 | 0.979629 | -0.042041 | 0.012479 | -0.014781 |
| 4 | 85.0285 | 69.4 | 0.006463 | 0.049960 | -0.140702 | 0.161327 | 0.010312 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 26200 | 1245.6035 | 276.0 | 0.329061 | 0.536328 | 0.253545 | 0.079541 | 0.166543 |
| 26201 | 1253.8981 | 29.2 | 0.000134 | 0.003188 | 0.394721 | 0.792212 | 0.593466 |
| 26202 | 1255.9144 | 29.4 | 0.000485 | 0.008308 | 0.220599 | 0.734591 | 0.477595 |
| 26203 | 1274.1295 | 40.1 | 0.154725 | 0.343972 | 0.286580 | 0.063715 | 0.175147 |
| 26204 | 1274.1842 | 39.9 | 0.008926 | 0.060915 | 0.185690 | -0.249075 | -0.031692 |
24233 rows × 7 columns
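Assuming `X` is already on the log2 scale (as after `transform` with the log step enabled), a log fold change can be sketched as the difference between group means. Judging from the output above, the `others vs Control` column appears to pool all non-reference samples, so the sketch computes that pooled contrast as well. Both the formula and the name `fold_change_sketch` are assumptions, not Rodin's documented implementation.

```python
import numpy as np
import pandas as pd

def fold_change_sketch(X, labels, reference=None):
    """Log fold change of each group, and of all other samples pooled, vs a reference.

    X: log2-scale intensities, features (rows) x samples (columns).
    labels: one class label per column of X.
    reference: reference class; defaults to the first class appearing in labels.
    """
    labels = np.asarray(labels)
    classes = list(pd.unique(labels))
    if reference is None:
        reference = classes[0]
    values = X.to_numpy(dtype=float)
    ref_mean = values[:, labels == reference].mean(axis=1)
    out = {}
    for c in classes:
        if c != reference:
            out[f"lfc ({c} vs {reference})"] = values[:, labels == c].mean(axis=1) - ref_mean
    # pooled contrast: every non-reference sample against the reference
    out[f"lfc (others vs {reference})"] = values[:, labels != reference].mean(axis=1) - ref_mean
    return pd.DataFrame(out, index=X.index)

# Toy log2 data: one feature, two samples per dose group
X = pd.DataFrame([[1.0, 1.0, 2.0, 2.0, 0.0, 0.0]])
dose = ["Control", "Control", "High", "High", "Low", "Low"]
print(fold_change_sketch(X, dose))
# High vs Control = 1.0, Low vs Control = -1.0, others vs Control = 0.0
```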
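The following methods are called in the same way; the argument names a column of `samples` (or, for `twoway_anova`, a list of two columns) of the indicated type:

- `ttest('categorical')`: t-test
- `pls_da('categorical')`: PLS-DA (partial least squares discriminant analysis)
- `twoway_anova(['cat1', 'cat2'])`: two-way ANOVA
- `sf_lg('binary')`: logistic regression
- `sf_lr('continuous')`: linear regression
- `rf_class('categorical')`: random forest classifier
- `rf_regress('continuous')`: random forest regressor

Additional parameters such as `degree`, `moderator`, or `n_estimators` can also be provided.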
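As an illustration of the regression-based tests, a per-feature simple linear regression against a continuous sample attribute can be sketched with SciPy's `linregress`. This is an assumed stand-in that shows the idea, not how `sf_lr` is implemented; the `age` values and the function name are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def linear_regression_sketch(X, covariate):
    """Regress each feature (row of X) on one continuous covariate (one value per sample)."""
    x = np.asarray(covariate, dtype=float)
    rows = []
    for _, y in X.iterrows():
        fit = linregress(x, y.to_numpy(dtype=float))
        rows.append({"slope": fit.slope, "p_value": fit.pvalue})
    return pd.DataFrame(rows, index=X.index)

age = [20, 30, 40, 50, 60]  # hypothetical continuous attribute, one value per sample
X = pd.DataFrame([[1.0, 3.0, 5.0, 7.0, 9.0],   # rises linearly with age
                  [2.0, 5.0, 1.0, 4.0, 3.0]])  # no clear trend
print(linear_regression_sketch(X, age))
```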
Plots are rendered with the Plotly library by default. To get static plots, set the `interactive` parameter to `False` (this option does not apply to the volcano plot).
The results of statistical tests can be visualized with the `volcano` method:
```python
obj.volcano(p='p_adj(owa) Dose', effect_size='lfc (High vs Control)',
            sign_line=0.00001, effect_size_line=[-1, 1])
```
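A volcano plot places each feature at its effect size (x) against -log10 of its p-value (y). The sketch below only computes those coordinates and flags features beyond the threshold lines. It assumes, without confirmation from Rodin's documentation, that `sign_line` is a p-value cutoff drawn at -log10(`sign_line`) and that `effect_size_line` gives the lower and upper effect-size cutoffs; `volcano_points` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def volcano_points(p, effect, sign_line=1e-5, effect_size_line=(-1, 1)):
    """Volcano-plot coordinates plus a flag for points beyond both threshold lines."""
    p = np.asarray(p, dtype=float)
    effect = np.asarray(effect, dtype=float)
    low, high = effect_size_line
    y = -np.log10(p)
    # significant: small p-value AND effect size outside the [low, high] band
    significant = (p < sign_line) & ((effect <= low) | (effect >= high))
    return pd.DataFrame({"x": effect, "y": y, "significant": significant})

pts = volcano_points(p=[1e-7, 1e-7, 0.2], effect=[1.5, 0.3, -2.0])
print(pts)
```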
Individual metabolites of interest can be plotted with `boxplot`, `violinplot`, and `regplot` by passing their row index numbers. The Rodin object keeps the indexes from the original data, so these numbers refer to the original rows even after filtering.
```python
obj.boxplot(rows=[9999, 4561],
            hue='Dose', category_order=['Control', 'Low', 'High'])
```
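Because filtering removes rows but keeps the original index labels (the ANOVA table above has 24233 rows yet ends at index 26204), a feature has to be looked up by its label rather than by its position. In pandas terms this is the difference between `.loc` and `.iloc`, shown here on hypothetical toy data:

```python
import pandas as pd

# Hypothetical feature table with 6 features, indexed 0..5
features = pd.DataFrame({"mz": [85.03, 101.02, 133.01, 150.05, 175.12, 204.09]})

# Filtering drops features 1 and 3 but keeps the original labels
filtered = features.drop(index=[1, 3])
print(filtered.index.tolist())   # → [0, 2, 4, 5]

print(filtered.loc[2, "mz"])     # → 133.01 (original feature 2, looked up by label)
print(filtered.iloc[2]["mz"])    # → 175.12 (third remaining row, i.e. original feature 4)
```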
To explore patterns in the data, use the `clustergram` method:
```python
obj.clustergram(hue='Dose', standardize='row', width=1200)
```
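A clustergram combines a heatmap with hierarchical clustering. The sketch below shows what row-wise standardization and a clustering-based row ordering could look like with SciPy. That `standardize='row'` means z-scoring each feature across samples, and the choice of average linkage on Euclidean distances, are assumptions; the function names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage

def row_standardize(X):
    """z-score every feature (row) across samples: mean 0, sample std 1."""
    return X.sub(X.mean(axis=1), axis=0).div(X.std(axis=1), axis=0)

def cluster_order(Z):
    """Row order from average-linkage hierarchical clustering (Euclidean distance)."""
    return leaves_list(linkage(Z.to_numpy(), method="average", metric="euclidean"))

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.lognormal(mean=5, sigma=1, size=(6, 8)))  # 6 features x 8 samples
Z = row_standardize(X)
Z_ordered = Z.iloc[cluster_order(Z)]  # rows reordered so similar features sit together
print(Z_ordered.round(2))
```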